- regexp(n) Tcl Built-In Commands
-
-
-
- _________________________________________________________________
-
- NAME
- regexp - Match a regular expression against a string
-
- SYNOPSIS
- regexp ?switches? exp string ?matchVar? ?subMatchVar subMatchVar ...?
- _________________________________________________________________
-
-
- DESCRIPTION
- Determines whether the regular expression exp matches part
- or all of string and returns 1 if it does, 0 if it doesn't.
-
- If additional arguments are specified after string then they
- are treated as the names of variables in which to return
- information about which part(s) of string matched exp.
- MatchVar will be set to the range of string that matched all
- of exp. The first subMatchVar will contain the characters
- in string that matched the leftmost parenthesized
- subexpression within exp, the next subMatchVar will contain
- the characters that matched the next parenthesized
- subexpression to the right in exp, and so on.
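
The variable-passing convention described above can be sketched as follows. The input line and the variable names (line, whole, name, id) are hypothetical and chosen only for illustration.

```tcl
# Hypothetical sample input; only the regexp command comes from this page.
set line "user=alice id=42"

# whole receives the text matched by the entire expression; name and id
# receive the text matched by the first and second parenthesized
# subexpressions, counted from the left.
set found [regexp {user=([a-z]+) id=([0-9]+)} $line whole name id]

puts "found=$found whole=$whole name=$name id=$id"
```

The pattern is enclosed in braces so that the Tcl parser does not treat the square brackets as command substitution.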
-
- If the initial arguments to regexp start with - then they
- are treated as switches. The following switches are
- currently supported:
-
- -nocase
- Causes upper-case characters in string to be
- treated as lower case during the matching process.
-
- -indices
- Changes what is stored in matchVar and the subMatchVars.
- Instead of storing the matching characters from
- string, each variable will contain a list of two
- decimal strings giving the indices in string of
- the first and last characters in the matching
- range of characters.
-
- --
- Marks the end of switches. The argument following
- this one will be treated as exp even if it starts
- with a -.
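
A short sketch of the three switches follows. The input strings and variable names are made up for illustration, and the pattern for -nocase is written in lower case to stay within the behavior described above.

```tcl
# -nocase: upper-case characters in the string are treated as lower
# case, so the lower-case pattern matches the upper-case text.
set r1 [regexp -nocase {hello} "Say HELLO" m1]

# -indices: the variables receive "first last" character indices
# instead of the matching text.
set r2 [regexp -indices {(b+)c} "aabbbcc" m2 s2]

# --: the next argument is taken as the pattern even though it
# starts with "-".
set r3 [regexp -- {-v} "tool -v" m3]

puts "nocase:  $r1 $m1"
puts "indices: $r2 whole=($m2) sub=($s2)"
puts "dashes:  $r3 $m3"
```

Indices count from 0, so in "aabbbcc" the run of b characters occupies positions 2 through 4.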
-
- If there are more subMatchVars than parenthesized
- subexpressions within exp, or if a particular subexpression in
- exp doesn't match the string (e.g. because it was in a
- portion of the expression that wasn't matched), then the
- corresponding subMatchVar will be set to ``-1 -1'' if
- -indices has been specified or to an empty string otherwise.
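
The two cases in the preceding paragraph can be illustrated with a small sketch. The patterns and variable names here are hypothetical.

```tcl
# The input "b" matches through the second alternative, so the
# subexpression (a) takes no part in the match.
regexp {(a)|(b)} b whole first second
puts "text:    whole='$whole' first='$first' second='$second'"

# Same match with -indices: the unmatched subexpression is reported
# as "-1 -1".
regexp -indices {(a)|(b)} b iwhole ifirst isecond
puts "indices: whole='$iwhole' first='$ifirst' second='$isecond'"

# One parenthesized subexpression but two subMatchVars: the extra
# variable is set to an empty string.
regexp {(x)} x m sub extra
puts "extra='$extra'"
```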
-
- REGULAR EXPRESSIONS
- Regular expressions are implemented using Henry Spencer's
- package (thanks, Henry!), and much of the description of
- regular expressions below is copied verbatim from his manual
- entry.
-
- A regular expression is zero or more branches, separated by
- ``|''. It matches anything that matches one of the
- branches.
-
- A branch is zero or more pieces, concatenated. It matches a
- match for the first, followed by a match for the second,
- etc.
-
- A piece is an atom possibly followed by ``*'', ``+'', or
- ``?''. An atom followed by ``*'' matches a sequence of 0 or
- more matches of the atom. An atom followed by ``+'' matches
- a sequence of 1 or more matches of the atom. An atom
- followed by ``?'' matches a match of the atom, or the null
- string.
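
As a rough illustration of the three repetition operators, the sketch below tests each against a few made-up strings. The ``^'' and ``$'' atoms, described in the next paragraph, pin each pattern to the whole string so that the counts are visible.

```tcl
# "*" allows zero or more b's, so "ac" matches.
set star [regexp {^ab*c$} ac]

# "+" needs at least one b: "ac" fails, "abbc" matches.
set plusNone [regexp {^ab+c$} ac]
set plusTwo  [regexp {^ab+c$} abbc]

# "?" allows zero or one b: "abc" matches, "abbc" fails.
set optOne [regexp {^ab?c$} abc]
set optTwo [regexp {^ab?c$} abbc]

puts "star=$star plusNone=$plusNone plusTwo=$plusTwo optOne=$optOne optTwo=$optTwo"
```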
-
- An atom is a regular expression in parentheses (matching a
- match for the regular expression), a range (see below),
- ``.'' (matching any single character), ``^'' (matching the
- null string at the beginning of the input string), ``$''
- (matching the null string at the end of the input string), a
- ``\'' followed by a single character (matching that
- character), or a single character with no other significance
- (matching that character).
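
The atom forms listed above can be tried out with a sketch like the following. The test strings are arbitrary, and every pattern is braced so that Tcl passes the backslash and dollar sign through to regexp unchanged.

```tcl
# "." matches any single character.
set dot [regexp {^a.c$} axc]

# "\" followed by a character matches that character literally, so
# "\." matches a real dot but not "x".
set litDot   [regexp {^a\.c$} a.c]
set litOther [regexp {^a\.c$} axc]

# Without anchors the pattern may match anywhere in the string; with
# "^" and "$" it must span the whole input.
set anywhere [regexp {b} abc]
set anchored [regexp {^b$} abc]

# Parentheses turn a regular expression into a single atom, which can
# then be repeated as a unit.
set grouped [regexp {^(ab)+$} ababab]

puts "$dot $litDot $litOther $anywhere $anchored $grouped"
```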
-
- A range is a sequence of characters enclosed in ``[]''. It
- normally matches any single character from the sequence. If
- the sequence begins with ``^'', it matches any single
- character not from the rest of the sequence. If two characters
- in the sequence are separated by ``-'', this is shorthand
- for the full list of ASCII characters between them (e.g.
- ``[0-9]'' matches any decimal digit). To include a literal
- ``]'' in the sequence, make it the first character
- (following a possible ``^''). To include a literal ``-'', make it
- the first or last character.
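
A minimal sketch of the range rules, using arbitrary single-character inputs. Bracing the patterns matters here, because an unbraced ``['' would start a Tcl command substitution.

```tcl
# "[0-9]" matches any decimal digit.
set digit [regexp {^[0-9]$} 7]

# A leading "^" negates the range: a digit no longer matches, but a
# letter does.
set notDigitA [regexp {^[^0-9]$} 7]
set notDigitB [regexp {^[^0-9]$} x]

# A literal "]" must come first in the sequence.
set bracket [regexp {^[]x]$} {]}]

# A literal "-" must come first or last in the sequence.
set dash [regexp {^[a-]$} -]

puts "$digit $notDigitA $notDigitB $bracket $dash"
```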
-
-
- CHOOSING AMONG ALTERNATIVE MATCHES
- In general there may be more than one way to match a regular
- expression to an input string. For example, consider the
- command
-
- regexp (a*)b* aabaaabb x y
-
- Considering only the rules given so far, x and y could end
- up with the values aabb and aa, aaab and aaa, ab and a, or
- any of several other combinations. To resolve this poten-
- tial ambiguity regexp chooses among alternatives using the
- rule ``first then longest''. In other words, it considers
- the possible matches in order working from left to right
- across the input string and the pattern, and it attempts to
- match longer pieces of the input string before shorter ones.
- More specifically, the following rules apply in decreasing
- order of priority:
-
- [1] If a regular expression could match two different parts
- of an input string then it will match the one that
- begins earliest.
-
- [2] If a regular expression contains | operators then the
- leftmost matching sub-expression is chosen.
-
- [3] In *, +, and ? constructs, longer matches are chosen in
- preference to shorter ones.
-
- [4] In sequences of expression components the components
- are considered from left to right.
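
Rules 1 and 3 are easy to observe in isolation. The following sketch uses made-up inputs and illustrates only those two rules.

```tcl
# Rule 1: the match that begins earliest wins, even though a longer
# run of b's appears later in the string.
regexp {b+} abbcbbb earliest
puts "earliest=$earliest"

# Rule 3: "*" takes as many a's as are available at that position.
regexp {a*} aaab longest
puts "longest=$longest"
```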
-
- In the example from above, (a*)b* matches aab: the (a*)
- portion of the pattern is matched first and it consumes the
- leading aa; then the b* portion of the pattern consumes the
- next b. Or, consider the following example:
-
- regexp (ab|a)(b*)c abc x y z
-
- After this command x will be abc, y will be ab, and z will
- be an empty string. Rule 4 specifies that (ab|a) gets first
- shot at the input string and Rule 2 specifies that the ab
- sub-expression is checked before the a sub-expression. Thus
- the b has already been claimed before the (b*) component is
- checked and (b*) must match an empty string.
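
Both examples from this section can be checked with a short script. This is only a sketch: the variable names x1 and y1 are introduced so that the two commands do not overwrite each other's results, and the patterns are braced, which does not change their meaning.

```tcl
# First example: (a*) consumes the leading "aa", then b* consumes the
# following "b".
regexp {(a*)b*} aabaaabb x1 y1
puts "x1=$x1 y1=$y1"

# Second example: (ab|a) claims "ab", leaving nothing for (b*).
regexp {(ab|a)(b*)c} abc x y z
puts "x=$x y=$y z='$z'"
```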
-
-
- KEYWORDS
- match, regular expression, string